Week 1 - Data Science Workflows
Dr Zak Varty
Data science is a collaborative discipline
Be a good collaborator, to others and to your future self
This week will show one framework to help you with that task
Like flossing: not difficult, but requires discipline
We will take an opinionated, R-focused approach, but the ideas transfer to other settings.
Ideally, we would like to organise our projects so that they are:
Is your work all in one place or scattered?
Can it be moved to a new location without breaking?
What do we mean by a new location, exactly?
A study is reproducible if you can take the original data and the computer code used to analyze the data and recreate all of the numerical findings from the study.
Broman et al. (2017) “Recommendations to Funding Agencies for Supporting Reproducible Research”
It is possible to write code and manage projects entirely in Notepad or at the command line.
This puts a lot of strain on you, on both your fingers and your brain
Integrated development environments (IDEs) such as RStudio, PyCharm and Visual Studio aim to reduce this burden
Effective Data Science: Workflows - Organising Your Work - Zak Varty